Who Reasons in the Large Language Models?

Probing Reasoning Localization in Transformer Architectures with Diagnostic Tools

Authors: J. Shao et al. 

Published on arXiv: 2025-05-27

Link: http://arxiv.org/abs/2505.20993v1

Institutions: National Key Laboratory for Novel Software Technology, Nanjing University, China • School of Artificial Intelligence, Nanjing University, China

Keywords: large language models, transformer, output projection, reasoning, interpretability, fine-tuning, module analysis, diagnostics, mathematical reasoning, Qwen, DeepSeek-R1, attention mechanism, stethoscope for networks, efficient training, modular models

Large Language Models (LLMs) excel at tasks such as reasoning and dialogue, yet the internal mechanisms that give rise to their reasoning abilities after fine-tuning remain poorly understood. There is ongoing debate over whether reasoning emerges from specific modules within the Transformer architecture or whether it is a distributed, incidental property of the network as a whole.

To address these questions, the authors hypothesize a concrete locus for reasoning and introduce a novel analytical approach with the following contributions:

- A hypothesis that reasoning in LLMs is primarily attributable to the output projection module (oproj) of the Transformer's multi-head self-attention, rather than being spread uniformly across the architecture.
- Stethoscope for Networks (SfN), a suite of diagnostic tools for probing and comparing the internal behavior of individual Transformer modules (see the sketch after this list).
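
To make the hypothesis concrete, here is a minimal sketch of module-selective fine-tuning: every parameter of a causal LM is frozen except the attention output projections. It assumes the Hugging Face naming convention in which Qwen-style models call these matrices `o_proj`; the checkpoint name is a placeholder, and this is an illustration, not the authors' released code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-0.5B"  # placeholder; any checkpoint with o_proj modules works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Freeze everything except the attention output projections (o_proj).
for name, param in model.named_parameters():
    param.requires_grad = "o_proj" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} of {total:,} ({100 * trainable / total:.2f}%)")

# One illustrative optimizer step on a toy batch; only o_proj receives updates.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)
batch = tokenizer("2 + 3 * 4 =", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```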

Building on the proposed hypothesis and methodology, the authors conduct in-depth experiments and report key findings:

- Circumstantial and empirical evidence from the SfN diagnostics suggests that oproj plays a central role in enabling reasoning, whereas other modules contribute more to fluent dialogue.
- On Qwen-based models, including DeepSeek-R1 distillations evaluated on mathematical reasoning, fine-tuning oproj alone appears sufficient to recover most of the gains of full fine-tuning, pointing to more efficient training (a probe in this spirit is sketched below).
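
One way to picture the kind of probe such diagnostics enable is a module swap: transplant only the oproj weights from a reasoning-tuned donor into an architecturally identical base model, then compare generations against the untouched base. This is a hypothetical illustration of the idea, not the SfN implementation; the model names are placeholders and must share the exact same architecture.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder names: the donor must match the base architecture exactly.
base = AutoModelForCausalLM.from_pretrained("org/base-model")
donor = AutoModelForCausalLM.from_pretrained("org/reasoning-tuned-model")

donor_state = donor.state_dict()
with torch.no_grad():
    for name, param in base.named_parameters():
        if "o_proj" in name:              # transplant only the output projections
            param.copy_(donor_state[name])

# `base` now carries the donor's oproj weights; comparing its outputs with the
# original base model isolates what those modules contribute to reasoning.
```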

These experimental results lead to a set of important conclusions:

- Reasoning appears to be localized in identifiable modules rather than a purely distributed, incidental phenomenon, lending support to a modular view of Transformer LLMs.
- Concentrating fine-tuning on oproj points toward more efficient training and, ultimately, more interpretable and modular models.